Abstract

What is the 'level of excellence' of a scientist, and what is the real impact of his/her work upon scientific thinking and practice? How can we design a fair and unbiased metric and, most importantly, a metric robust to manipulation?



Quantifying an individual's scientific merit

The evaluation of the scientific work of a scientist has long attracted significant interest, due to the benefits of obtaining an unbiased and fair criterion. A few years ago, such metrics were just another topic of investigation for the scientometric community, of theoretical importance only and without any practical extensions.

Very recently, though, the situation has changed dramatically: an increasing number of academic institutions are using such scientometric indicators to decide faculty promotions. Automated methodologies have been developed to calculate such indicators [6]. Funding agencies also use them to allocate funds, and recently some governments have been considering the consistent use of such metrics for funding distribution. For instance, the Australian government has established the Research Quality Framework (RQF) as an important feature in the fabric of research in Australia¹; the UK government has established the Research Assessment Exercise (RAE) to produce quality profiles for each submission of research activity made by institutions².

The use of such indicators to characterize a scientist's merit is controversial, and a plethora of arguments can be stated against their use. In his recent article, David Parnas [5] described the negative consequences for scientific progress caused by the "publish or perish" marathon run by all scientists.

Following the reasoning of the phrase attributed to A. Einstein that "Not everything that can be counted counts, and not everything that counts can be counted", we stress that the assessment of a scientist is a complex social and scientific process that is difficult to narrow into a single scientometric indicator. Most of the time, a verbal description of a scholar's quality is probably the best indicator. Still, the expressive and descriptive power of numbers (i.e., scientometric indicators) cannot be unthinkingly ignored; instead of devaluing them, we should strive to develop the "correct set" of indicators and, most importantly, to use them in the right way.

¹ http://www.uts.edu.au/research/policies/resdata/RQF.html
² http://www.rae.ac.uk

No matter how skeptical one is about the use of such indicators, the impact of a scholar can quite safely be described in terms of the acceptance of his/her ideas by the wider scientific community that s/he belongs to. Traditionally, this acceptance is measured by the number of authored papers and/or the number of citations. The early metrics are based on some form of arithmetic upon the total number of authored papers, the average number of authored papers per year, the total number of citations, the average number of citations per paper, the mean number of citations per year, the median citations per paper (per year), and so on. Due to the power-law distribution followed by these metrics, they present one or more of the following drawbacks (see also [4]):

• They do not measure the impact of papers. 

• They are affected by a small number of "big hits" articles. 

• It is difficult to set administrative parameters for them.

J. E. Hirsch attempted to collectively overcome all these disadvantages and proposed a pioneering metric, the now famous h-index [4]. The h-index was a truly path-breaking idea and inspired several research efforts to cure its various deficiencies, e.g., its aging-ignorant behaviour [9].
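For reference, the h-index of [4] is the largest number h such that h of an author's papers have at least h citations each. A minimal Python sketch, assuming the citation counts are already available as a plain list (the function name `h_index` and the sample counts are illustrative and not taken from the article):

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one author's papers.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

Unlike a mean, the value is hardly moved by a single "big hit": under this sketch, `h_index([1000, 1, 1])` is still 1.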

Nevertheless, there is a latent weakness in all scientometric indicators developed so far, either those for ranking individuals or those for ranking publication fora, and the h-index is yet another victim of this complication. The inadequacy of the indicators stems from the existence of what we term here, for the first time in the literature, scientospam.

The notion of scientospam 

With a retrospective look, we see that one of the main technical motivations for the introduction of the h-index was that the metrics used until then (i.e., total, average, max, min, median citation count) were very vulnerable to self-citations, which in general are perceived as a form of "manipulation". In his original article, Hirsch specifically mentioned the robustness of the h-index with respect to self-citations and indirectly argued that the h-index can hardly be manipulated. Indeed, the h-index is more robust than traditional metrics, but it is not immune to self-citations [7]. Actually, none of the existing indicators is robust to self-citations. In general, the issue of self-citations is examined in many studies, e.g., [3], and the usual practice is to ignore them when performing scientometric evaluations, since in many cases they may account for a significant part of a scientist's reputation [1].

At this point, we argue that there is nothing wrong with self-citations; they can effectively describe the "authoritativeness" of an article, e.g., in the cases where the self-cited author is a pioneer in his/her field and s/he keeps steadily advancing it in a step-by-step publishing fashion, until other scientists gradually discover and follow his/her ideas.

In the sequel we will show that the problem is much more complex and goes beyond self-citations; it involves the fundamental meaning of a citation. Consider, for instance, the citing patterns appearing in Figure 1.



Article-1 is cited by three other papers (the ovals), and these citing articles have been authored by (strictly) disjoint sets of authors, i.e., $\{a_1, a_2\}$, $\{a_3, a_4\}$ and $\{a_5, a_6\}$, respectively. On the other hand, Article-2 is cited by three other papers which have all been authored by the same author $\{a_1\}$. Notice that we make no specific mention of the identity of the authors of Article-1 or Article-2 with respect to the identity of the authors $a_i$; some of the authors of the citing papers may coincide with those of the cited articles. Our treatment of the problem is more generic than self-citations.

Figure 1. Citing extremes: (Left) No overlap at all. (Right) Full overlap.

While we have no problem accepting that Article-1 has received three citations, we feel that Article-2 has received no more than one citation. Reasons for this feeling include, for instance, the heavy influence of Article-2 on author $a_1$ combined with the large productivity of this author. Nevertheless, considering that all authors $a_1$ to $a_6$ have read (have they?) Article-1 and only one author has read Article-2, it seems that the former article has a larger impact upon scientific thinking. On the one hand, we could argue that the contents of Article-2 are so sophisticated and advanced that only a few scholars, if any, could even grasp some of the article's ideas. On the other hand, for how long could such a situation persist? If Article-2 is a significant contribution, then it would get, after some time, its rightful position in the citation network, even if the scientific subcommunity to which it belongs is substantially smaller than the subcommunity of Article-1.

The situation is even more complicated if we consider the citation pattern appearing in Figure 2, where there exist overlapping sets of authors in the citing papers. For instance, author $a_3$ is a coauthor of all three citing papers.




Figure 2. Citing articles with author overlap. 

This pattern of citation, where some author has coauthored multiple papers citing another paper, is the spirit of what is termed in this article scientometric spam, or scientospam. The term spam is used in two other cases: it denotes malicious emails (e-mail spam) and also Web links (link spam) that attempt to mislead search engines when the engines exploit some form of link analysis ranking. Whereas the word spam has acquired a negative reputation representing malicious behaviour, we use it here as a means to describe misinformation.

Apparently, there exists no prior work on combating scientospam; the closest related works include techniques to filter self-citations or to weigh multi-author self-citations [7, 8]. Our target is to develop a metric of scientific excellence for individuals that is truly robust to scientospam. We firmly believe that the exclusion of self-citations is not a fair action; neither is any form of ad hoc normalization. Each and every citation has its value; the problem is to quantify this value.

The notion of scientospam leads naturally to the process of discovering spamming patterns and applying a "controlled discount" to them. If we look more carefully at the citation data, we can gain deeper knowledge and thus produce a fairer and more robust evaluation. A more careful look implies a higher computational cost than that of simple indicators like the h-index, but in general we are willing to pay it, since the evaluation is an offline process. On the other hand, we have to avoid time-consuming and doubtful clustering procedures and special treatment of self-citations, so as to maintain the indicators' simplicity and beauty.

The f-index

We consider the citing example shown in Figure 2, where an article, say $A$, is cited by three other articles, and let us define the quantity $nca^A$ to be equal to the number of articles citing article $A$. We define the series of sets $F_i^A = \{a_j : \text{author } a_j \text{ appears in exactly } i \text{ articles citing } A\}$. For the cited article of Figure 2, called ART-3 here, we have that $F_1^{ART-3} = \{a_5, a_6, a_7\}$, $F_2^{ART-3} = \{a_1, a_2, a_4\}$, $F_3^{ART-3} = \{a_3\}$.
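The sets $F_i^A$ can be computed directly from the author lists of the citing papers. The sketch below is illustrative only: Figure 2 is not reproduced here, so the three author lists are a hypothetical assignment chosen to be consistent with the sets stated above, and the function name `author_multiplicity_sets` is likewise an assumption.

```python
from collections import Counter

def author_multiplicity_sets(citing_author_lists):
    """Map i -> set of authors appearing in exactly i citing articles."""
    # Count each author once per citing article.
    counts = Counter(a for authors in citing_author_lists for a in set(authors))
    nca = len(citing_author_lists)
    sets = {i: set() for i in range(1, nca + 1)}
    for author, i in counts.items():
        sets[i].add(author)
    return sets

# Hypothetical author lists for the three papers citing ART-3.
art3_citing = [
    ["a1", "a2", "a3", "a5"],
    ["a1", "a3", "a4", "a6"],
    ["a2", "a3", "a4", "a7"],
]
F = author_multiplicity_sets(art3_citing)
for i in sorted(F):
    print(i, sorted(F[i]))
```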

Then, we define $f_i^A$ to be equal to the ratio of the cardinality of $F_i^A$ to the total number of distinct authors citing article $A$, i.e., $f_i^A = \frac{|F_i^A|}{\text{total number of distinct authors}}$. These quantities constitute the coordinates of an $nca^A$-dimensional vector $f^A$, which is equal to $f^A = \{f_1^A, f_2^A, f_3^A, \ldots, f_{nca^A}^A\}$.

The coordinates of this vector define a probability mass, since $\sum_{i=1}^{nca^A} f_i^A = 1$. For the above example of the cited article ART-3, we have that $f^{ART-3} = \{\frac{3}{7}, \frac{3}{7}, \frac{1}{7}\}$. Similarly, for the cited article ART-1, we have that $f^{ART-1} = \{\frac{6}{6}, \frac{0}{6}, \frac{0}{6}\}$, and for ART-2, we have that $f^{ART-2} = \{\frac{0}{1}, \frac{0}{1}, \frac{1}{1}\}$.
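As a check on the three example vectors, the following sketch computes $f^A$ with exact fractions. The author lists for ART-3 are a hypothetical assignment consistent with the sets $F_i$ given for Figure 2, and `f_vector` is an illustrative name rather than anything defined in the article.

```python
from collections import Counter
from fractions import Fraction

def f_vector(citing_author_lists):
    """Return [f_1, ..., f_nca]: the share of distinct citing authors
    that appear in exactly i of the citing articles."""
    counts = Counter(a for authors in citing_author_lists for a in set(authors))
    nca = len(citing_author_lists)
    distinct = len(counts)
    per_multiplicity = Counter(counts.values())
    return [Fraction(per_multiplicity[i], distinct) for i in range(1, nca + 1)]

art1 = [["a1", "a2"], ["a3", "a4"], ["a5", "a6"]]  # no overlap (Figure 1, left)
art2 = [["a1"], ["a1"], ["a1"]]                     # full overlap (Figure 1, right)
art3 = [["a1", "a2", "a3", "a5"],                   # hypothetical lists for Figure 2
        ["a1", "a3", "a4", "a6"],
        ["a2", "a3", "a4", "a7"]]

for name, papers in [("ART-1", art1), ("ART-2", art2), ("ART-3", art3)]:
    vec = f_vector(papers)
    assert sum(vec) == 1  # the coordinates form a probability mass
    print(name, [str(x) for x in vec])
```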

Thus, we have converted a scalar quantity, i.e., the number of citations that an article has received, into a vector quantity, i.e., $f^A$, which represents the penetration of $A$'s ideas, and consequently of its author(s), into the scientific community; the more people know a scholar's work, the more significant s/he is. In general, these vectors are sparse, with a lot of 0's after the first coordinates. The sparsity of the vector is reduced for cited articles that have only a few citations. Naturally, for successful scholars we would prefer the probability mass to be concentrated on the first coordinates, which would mean that new scientists consistently become aware of and use the article's ideas. As the probability mass gets concentrated on the coordinates near the end of $f^A$, the "audience" gets narrower, which implies the existence of cliques and/or citations due to the minimum publishable increment, as both are described by Parnas [5].

Working with vectors is complicated, though, and a single number would be the preferred choice. At this point, we can exploit a "spreading" vector, say $s$, to convert the vector $f^A$ into a single number through a dot-product operation, i.e., $N_f^A = f^A \cdot s$. For the moment we will use the plainest vector, defined as $s_1 = \{nca, nca - 1, \ldots, 1\}$; other choices will be presented in the sequel. Thus, for the example article ART-3 which we are working with, we compute a new decimal number characterizing its significance, and this number is equal to $N_f^{ART-3} = f^{ART-3} \cdot s_1 = \frac{3}{7} \cdot 3 + \frac{3}{7} \cdot 2 + \frac{1}{7} \cdot 1 = \frac{16}{7} \approx 2.28$.
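The dot product is easy to verify numerically. A minimal sketch, assuming the vector $f^A$ is given as a list of exact fractions (the name `spread_s1` is illustrative):

```python
from fractions import Fraction

def spread_s1(f):
    """Dot product of f with s1 = (nca, nca - 1, ..., 1)."""
    nca = len(f)
    s1 = range(nca, 0, -1)
    return sum(fi * si for fi, si in zip(f, s1))

f_art1 = [Fraction(6, 6), Fraction(0, 6), Fraction(0, 6)]
f_art2 = [Fraction(0, 1), Fraction(0, 1), Fraction(1, 1)]
f_art3 = [Fraction(3, 7), Fraction(3, 7), Fraction(1, 7)]

print(spread_s1(f_art1))  # 3
print(spread_s1(f_art2))  # 1
print(spread_s1(f_art3))  # 16/7
```

Under $s_1$, Article-1 keeps its three citations while Article-2 is discounted to one, which matches the intuition stated for Figure 1.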

The f-index. Now, we can define the proposed f-index in a spirit completely analogous to that of the h-index. To compute the f-index of an author, we calculate the quantities $N_f^{A_i}$ for each one of his/her authored articles $A_i$ and rank them in non-increasing order. The point where the rank becomes larger than the respective $N_f^{A_i}$ in the sorted sequence defines the value of the f-index for that author.
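One reading of this definition, by analogy with the h-index, is that the f-index is the last rank r at which the r-th largest value $N_f$ is still at least r. The sketch below follows that reading; it is an interpretation rather than the authors' reference implementation, and the input values are hypothetical.

```python
def f_index(n_values):
    """Last rank r (1-based) whose r-th largest N_f value is >= r."""
    ranked = sorted(n_values, reverse=True)
    result = 0
    for rank, value in enumerate(ranked, start=1):
        if rank > value:  # the rank has become larger than N_f: stop
            break
        result = rank
    return result

# Hypothetical per-article N_f values for one author.
values = [6.5, 4.2, 3.0, 2.28, 1.0]
print(f_index(values))
```

Because the $N_f$ values are decimals no larger than the raw citation counts, the result under this reading can never exceed the author's h-index.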

The spreading vector. Earlier, we used the simplest spreading vector; different such vectors can disclose different facts about the importance of the cited article. Apart from $s_1$, we also propose a couple of easy-to-conceive versions of the spreading vector. The vector $s_2 = \{nca, 0, \ldots, 0\}$ lies at the other extreme of the spectrum with respect to $s_1$. Finally, if we suppose that the last non-zero coordinate of $f^A$ is $f_k^A$, then we have a third version of the spreading vector defined as $s_3 = \{nca, nca - \frac{nca}{k}, nca - \frac{2 \cdot nca}{k}, \ldots, 1\}$. For each one of these spreading vectors, we define the respective f-index as $f_{s_1}$, $f_{s_2}$, and $f_{s_3}$. None of these three versions of the spreading vector, and consequently of the respective indexes, can be considered superior to the other two. They present merits and deficiencies in different cases. For instance, the $f_{s_1}$ index does not make any difference for large h-index values; for scientists with h-index smaller than 15, the obtained $f_{s_1}$ index can be as much as 50% of the respective h-index.
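A short derivation, using only the definitions of $f^A$, $s_1$ and $s_2$ and the fact that the coordinates of $f^A$ sum to one, makes the two extremes explicit. The subscripted notation $N_{f,s}^{A}$ for the scalar obtained with spreading vector $s$ is introduced here for illustration only.

```latex
\begin{align*}
N_{f,s_1}^{A} &= \sum_{i=1}^{nca^{A}} f_i^{A}\,\bigl(nca^{A} - i + 1\bigr)
               = nca^{A} + 1 - \sum_{i=1}^{nca^{A}} i\, f_i^{A}, \\
N_{f,s_2}^{A} &= nca^{A}\, f_1^{A}.
\end{align*}
```

The sum $\sum_i i\, f_i^{A}$ is the average number of citing articles per distinct citing author, so $s_1$ subtracts only the excess of that average over one, whereas $s_2$ keeps just the share of authors who cite exactly once. For ART-3 these give $3 + 1 - \frac{12}{7} = \frac{16}{7}$ and $3 \cdot \frac{3}{7} = \frac{9}{7}$, respectively.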

Validation 

As we stressed right from the beginning of the article, when it comes to characterizing the entire professional life of a scholar with a single number, things get really complicated. The validation of the usefulness of the proposed indexes is not an easy task, given our respect for the principle that "not everything that can be counted counts". This article aims at introducing the notion of scientospam and proposing a method to combat it. The comments made in this article should not harm the reputation, and do not diminish the contributions, of any scientist mentioned. As input data for applying our ideas, we selected a number of computer scientists with high h-index (http://www.cs.ucla.edu/~palberg/h-number.html), who are beyond any question top-quality researchers.

Since the data provided by the aforementioned URL are not up-to-date and are also faulty, we first cleansed them and kept the scientists with h-index larger than 30. The ranking in non-increasing h-index is illustrated in Table 1.



| r | Scientist-h | r | Scientist-h | r | Scientist-h |
|---|---|---|---|---|---|
| 1 | Hector Garcia-Molina-77 | 17 | Oded Goldreich-48 | 23 | Carl Kesselman-42 |
| 2 | Jiawei Han-66 | 17 | Philip S. Yu-48 | 24 | Olivier Faugeras-41 |
| 3 | Ian Foster-65 | 17 | Prabhakar Raghavan-48 | 25 | Teuvo Kohonen-40 |
| 4 | Robert Tarjan-64 | 17 | Leslie Lamport-48 | 25 | Amit Sheth-40 |
| 5 | Rakesh Agrawal-62 | 17 | Douglas C. Schmidt-48 | 25 | Craig Chambers-40 |
| 6 | Jennifer Widom-60 | 18 | Michael I. Jordan-47 | 25 | Demetri Terzopoulos-40 |
| 6 | Scott Shenker-60 | 18 | Donald E. Knuth-47 | 25 | David A. Patterson-40 |
| 7 | Jeffrey D. Ullman-59 | 18 | Ronald Fagin-47 | 25 | Philip Wadler-40 |
| 8 | Deborah Estrin-58 | 18 | Micha Sharir-47 | 25 | Jose Meseguer-40 |
| 9 | David Culler-56 | 19 | H. V. Jagadish-46 | 25 | George Karypis-40 |
| 9 | Amir Pnueli-56 | 19 | Mihir Bellare-46 | 26 | Geoffrey E. Hinton-39 |
| 10 | Richard Karp-55 | 19 | Pat Hanrahan-46 | 26 | Stefano Ceri-39 |
| 10 | Serge Abiteboul-55 | 19 | Garcia Luna Aceves-46 | 26 | Leonard Kleinrock-39 |
| 11 | David J. DeWitt-54 | 20 | Michael Franklin-45 | 26 | Saul Greenberg-39 |
| 11 | David E. Goldberg-54 | 20 | Alex Pentland-45 | 26 | Judea Pearl-39 |
| 12 | Anil K. Jain-53 | 20 | Martin Abadi-45 | 26 | David Dill-39 |
| 13 | Hari Balakrishnan-53 | 20 | Andrew Zisserman-45 | 27 | Vern Paxson-38 |
| 13 | Randy H. Katz-52 | 20 | Thomas A. Henzinger-45 | 27 | John A. Stankovic-38 |
| 14 | Takeo Kanade-52 | 20 | Vipin Kumar-45 | 27 | Krithi Ramamritham-38 |
| 14 | Rajeev Motwani-51 | 20 | Nancy Lynch-45 | 27 | Ramesh Govindan-38 |
| 15 | Don Towsley-50 | 21 | Christos Faloutsos-44 | 27 | Jon Kleinberg-38 |
| 15 | Chr. H. Papadimitriou-50 | 21 | Thomas S. Huang-44 | 28 | Al. Sangiovanni-Vincentelli-37 |
| 15 | Sebastian Thrun-50 | 21 | Sally Floyd-44 | 28 | Edmund M. Clarke-37 |
| 15 | Jack Dongarra-50 | 21 | Robin Milner-44 | 29 | Herbert Edelsbrunner-36 |
| 15 | Ken Kennedy-50 | 21 | Won Kim-44 | 29 | Richard Lipton-36 |
| 16 | Didier Dubois-49 | 22 | M. Frans Kaashoek-43 | 29 | Ronald L. Rivest-36 |
| 16 | Lixia Zhang-49 | 22 | Kai Li-43 | 29 | Willy Zwaenepoel-36 |
| 16 | Michael J. Carey-49 | 22 | Monica S. Lam-43 | 29 | Jason Cong-36 |
| 16 | Michael Stonebraker-49 | 22 | Sushil Jajodia-43 | 30 | Victor Basili-35 |
| 16 | Moshe Y. Vardi-49 | 22 | Rajeev Alur-43 | 30 | Mario Gerla-35 |
| 16 | David S. Johnson-49 | 23 | Raghu Ramakrishnan-42 | 30 | Andrew S. Tanenbaum-35 |
| 16 | Ben Shneiderman-49 | 23 | Barbara Liskov-42 | 31 | Maja Mataric-33 |
| 16 | W. Bruce Croft-49 | 23 | Tomaso Poggio-42 | 32 | John McCarthy-32 |
| 17 | Mihalis Yannakakis-48 | 23 | Victor Lesser-42 | 32 | David Haussler-32 |
| 17 | Miron Livny-48 | 23 | Joseph Goguen-42 | 33 | Stanley Osher-31 |
| 17 | Luca Cardelli-48 | 23 | Henry Levy-42 | 33 | Tim Finin-31 |

Table 1. Computer scientists' ranking based on h-index.



Then, we applied the new indicators $f_{s_2}$ and $f_{s_3}$, and the results appear in Table 2. Both indicators cause changes in the ranking provided by the h-index. As expected, the values of the $f_{s_2}$ index are significantly different from the respective h-index values. It is important to note that these differences (and their size) appear in any position, independently of the value of the h-index. If these differences concerned only the scientists with the largest h-index, then we could (safely) argue that for someone who has written a lot of papers, each of which has received a large number of citations, some overlapping citations and some self-citations are unavoidable. This is not the case, though, and it seems that there is a deeper, latent explanation.

Seeking this explanation, we calculated the differences in ranking positions for each scientist when ranked with the h-index versus when ranked with the $f_{s_2}$ index. The results are illustrated in Table 3.
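The position changes can be computed with a dense ranking in which tied values share a position, which is the convention the h-index ranking follows. A minimal sketch, assuming each index is available as a name-to-value mapping; the helper names and the three-scholar sample are hypothetical and are not data from the tables.

```python
def dense_rank(scores):
    """Rank names by non-increasing score; ties share a position."""
    distinct = sorted(set(scores.values()), reverse=True)
    position = {value: pos for pos, value in enumerate(distinct, start=1)}
    return {name: position[value] for name, value in scores.items()}

def rank_shift(h_scores, f_scores):
    """Positions gained (+) or lost (-) when moving from the h to the f ranking."""
    h_rank = dense_rank(h_scores)
    f_rank = dense_rank(f_scores)
    return {name: h_rank[name] - f_rank[name] for name in h_scores}

# Hypothetical values for three scholars.
h_scores = {"X": 50, "Y": 48, "Z": 48}
f_scores = {"X": 40, "Y": 44, "Z": 40}
print(rank_shift(h_scores, f_scores))
```

A positive shift means the scholar climbs when overlapping citations are discounted; a negative shift means positions are lost.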

| r | Scientist - $f_{s_2}$ - $f_{s_3}$ | r | Scientist - $f_{s_2}$ - $f_{s_3}$ | r | Scientist - $f_{s_2}$ - $f_{s_3}$ |
|---|---|---|---|---|---|
| 1 | Hector Garcia-Molina - 68 - 74 | 17 | Donald E. Knuth - 41 - 45 | 21 | Geoffrey E. Hinton - 37 - 37 |
| 2 | Jiawei Han - 57 - 63 | 17 | Philip S. Yu - 41 - 46 | 22 | Teuvo Kohonen - 36 - 39 |
| 2 | Ian Foster - 57 - 62 | 18 | Miron Livny - 40 - 45 | 22 | Andrew Zisserman - 36 - 41 |
| 3 | Robert Tarjan - 56 - 61 | 18 | Luca Cardelli - 40 - 46 | 22 | Sushil Jajodia - 36 - 41 |
| 4 | Scott Shenker - 54 - 59 | 18 | Ronald Fagin - 40 - 45 | 23 | Joseph Goguen - 35 - 40 |
| 5 | Jennifer Widom - 53 - 58 | 18 | H. V. Jagadish - 40 - 44 | 23 | Rajeev Alur - 35 - 41 |
| 5 | Jeffrey D. Ullman - 53 - 55 | 18 | Didier Dubois - 40 - 44 | 23 | Philip Wadler - 35 - 38 |
| 6 | David Culler - 52 - 53 | 18 | Alex Pentland - 40 - 43 | 23 | Amit Sheth - 35 - 39 |
| 7 | Deborah Estrin - 51 - 56 | 18 | Thomas S. Huang - 40 - 42 | 23 | Nancy Lynch - 35 - 42 |
| 7 | Rakesh Agrawal - 51 - 60 | 18 | Sally Floyd - 40 - 43 | 23 | Leonard Kleinrock - 35 - 38 |
| 8 | David E. Goldberg - 50 - 52 | 18 | Robin Milner - 40 - 42 | 23 | Vern Paxson - 35 - 37 |
| 9 | Richard Karp - 49 - 55 | 18 | M. Frans Kaashoek - 40 - 41 | 23 | John A. Stankovic - 35 - 37 |
| 10 | David J. DeWitt - 48 - 51 | 18 | Carl Kesselman - 40 - 42 | 24 | Saul Greenberg - 34 - 37 |
| 10 | Hari Balakrishnan - 48 - 52 | 19 | Moshe Y. Vardi - 39 - 46 | 24 | Stefano Ceri - 34 - 37 |
| 11 | Anil K. Jain - 47 - 50 | 19 | Martin Abadi - 39 - 43 | 24 | Raghu Ramakrishnan - 34 - 40 |
| 11 | Amir Pnueli - 47 - 52 | 19 | Christos Faloutsos - 39 - 43 | 24 | Krithi Ramamritham - 34 - 38 |
| 11 | Takeo Kanade - 47 - 50 | 19 | Mihalis Yannakakis - 39 - 46 | 24 | Jon Kleinberg - 34 - 36 |
| 12 | Randy H. Katz - 46 - 51 | 19 | Mihir Bellare - 39 - 45 | 25 | Ramesh Govindan - 33 - 36 |
| 12 | Lixia Zhang - 46 - 48 | 19 | Oded Goldreich - 39 - 45 | 25 | Edmund M. Clarke - 33 - 34 |
| 13 | Don Towsley - 45 - 49 | 19 | Garcia Luna Aceves - 39 - 43 | 26 | Judea Pearl - 32 - 36 |
| 13 | Serge Abiteboul - 45 - 52 | 19 | Kai Li - 39 - 41 | 26 | Richard Lipton - 32 - [illegible] |
| 13 | David S. Johnson - 45 - [illegible] | 19 | Barbara Liskov - 39 - [illegible] | 26 | Ronald L. Rivest - 32 - [illegible] |
| 14 | Ken Kennedy - 44 - 49 | 19 | Tomaso Poggio - 39 - [illegible] | 26 | Victor Basili - 32 - [illegible] |
| 14 | Rajeev Motwani - 44 - 48 | 19 | Henry Levy - 39 - 40 | 26 | Andrew S. Tanenbaum - 32 - 34 |
| 14 | Sebastian Thrun - 44 - 48 | 19 | Michael Franklin - 39 - 42 | 26 | David Haussler - 32 - 34 |
| 14 | Ben Shneiderman - 44 - 48 | 20 | Won Kim - 38 - 42 | 27 | Jose Meseguer - 31 - 37 |
| 14 | Prabhakar Raghavan - 44 - 46 | 20 | Monica S. Lam - 38 - 42 | 27 | David Dill - 31 - 35 |
| 15 | W. Bruce Croft - 43 - 46 | 20 | Vipin Kumar - 38 - 41 | 27 | Willy Zwaenepoel - 31 - 34 |
| 15 | Chr. H. Papadimitriou - 43 - 47 | 21 | Victor Lesser - 37 - 41 | 28 | Al. Sangiovanni-Vincentelli - 30 - 34 |
| 15 | Michael I. Jordan - 43 - 46 | 21 | Thomas A. Henzinger - 37 - 43 | 28 | Mario Gerla - 30 - 33 |
| 16 | Michael Stonebraker - 42 - 45 | 21 | Micha Sharir - 37 - 43 | 29 | Herbert Edelsbrunner - 29 - 34 |
| 16 | Jack Dongarra - 42 - 48 | 21 | Olivier Faugeras - 37 - 40 | 29 | Tim Finin - 29 - 30 |
| 16 | Leslie Lamport - 42 - 45 | 21 | Craig Chambers - 37 - 40 | 30 | Jason Cong - 28 - 33 |
| 16 | Douglas C. Schmidt - 42 - 46 | 21 | Demetri Terzopoulos - 37 - 38 | 31 | Maja Mataric - 27 - 30 |
| 16 | Michael J. Carey - 42 - 46 | 21 | David A. Patterson - 37 - 39 | 31 | Stanley Osher - 27 - 31 |
| 16 | Pat Hanrahan - 42 - 44 | 21 | George Karypis - 37 - 38 | 32 | John McCarthy - 26 - 29 |

Table 2. Computer scientists' ranking based on $f_{s_2}$. The $f_{s_3}$ value is shown too.

³ Scientists with the same h-index have the same ranking position. For instance, J. Widom and S. Shenker are each ranked 6th in the h-index ranking. The same holds for the ranking based on $f_{s_2}$.

The general comment is that the scientists who climb the largest number of positions are those whose work can "penetrate" (and thus benefit) large "audiences". For instance, the research results of Lixia Zhang and John A. Stankovic, who now work on sensors, are cited in communities like databases, networking, and communications. Other scientists whose work is used by large audiences are those working on "computer organization", e.g., M. Frans Kaashoek, Barbara Liskov, Andrew S. Tanenbaum, etc. Notice here that scientists' age has nothing to do with the ranking relocation, since younger researchers (e.g., Lixia Zhang) can climb positions just like older scientists (e.g., Andrew S. Tanenbaum).

Another important question concerns whether the particular area of expertise of a researcher could help him/her acquire a larger reputation. Undoubtedly, the research area plays its role, but it is not the definitive factor. Consider, for instance, the case of data mining, which is a large area and has attracted an even larger number of researchers. We see that George Karypis has gained four positions in the ranking provided by $f_{s_2}$. If the area of expertise were the only rational explanation for that, then why is Rakesh Agrawal, who founded the field, among the scientists that lost the largest number of positions in the ranking provided by $f_{s_2}$? The answer lies in the particularities of the research subfields; George Karypis contributed some very important results that are also useful in the field of bioinformatics. To strengthen this, we can mention the case of Jiawei Han. He is a data-mining expert whose work penetrates communities like mining, databases, information retrieval, and artificial intelligence, and he is ranked second based on the h-index, on $f_{s_2}$, or on $f_{s_3}$.

Examining the scholars with the largest losses, we see that scientists who have made ground-breaking contributions and offered some unique results, e.g., Mihalis Yannakakis and Moshe Y. Vardi, drop in the ranking provided by $f_{s_2}$. This has nothing to do with the theoretical vs. practical sides of computer science; contrast the cases of M. Yannakakis and M. Vardi with those of A. Zisserman and R. Agrawal. It is due to the nature of the scientific results, which do not "resound" in other communities.
Most positions up:

| Scientist-h | h-rank | earned pos. in $f_{s_2}$ |
|---|---|---|
| David Haussler-32 | 32 | +6 |
| Carl Kesselman-42 | 23 | +5 |
| Geoffrey E. Hinton-39 | 26 | +5 |
| Lixia Zhang-49 | 16 | +4 |
| M. Frans Kaashoek-43 | 22 | +4 |
| Barbara Liskov-42 | 23 | +4 |
| Tomaso Poggio-42 | 23 | +4 |
| Henry Levy-42 | 23 | +4 |
| Craig Chambers-40 | 25 | +4 |
| Demetri Terzopoulos-40 | 25 | +4 |
| David A. Patterson-40 | 25 | +4 |
| George Karypis-40 | 25 | +4 |
| Vern Paxson-38 | 27 | +4 |
| John A. Stankovic-38 | 27 | +4 |
| Victor Basili-35 | 30 | +4 |
| Andrew S. Tanenbaum-35 | 30 | +4 |
| Tim Finin-31 | 33 | +4 |

Most positions down:

| Scientist-h | h-rank | lost pos. in $f_{s_2}$ |
|---|---|---|
| Rakesh Agrawal-62 | 5 | -2 |
| Amir Pnueli-56 | 9 | -2 |
| Didier Dubois-49 | 16 | -2 |
| Mihalis Yannakakis-48 | 17 | -2 |
| Oded Goldreich-48 | 17 | -2 |
| Andrew Zisserman-45 | 20 | -2 |
| Jose Meseguer-40 | 25 | -2 |
| Serge Abiteboul-55 | 10 | -3 |
| Moshe Y. Vardi-49 | 16 | -3 |
| Micha Sharir-47 | 18 | -3 |
| Nancy Lynch-45 | 20 | -3 |

Table 3. Largest relocations w.r.t. rank position: most positions up (first part) and most positions down (second part).

Discussion

When measuring science we should always keep in mind the principle that "not everything that can be counted counts". On the other hand, we believe in the power of numbers and we side with Lord Kelvin, who stated that "When you can measure what you are speaking about, and express it in numbers, you know something about it. But when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind: It may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science."

We argue that instead of anathematizing each and every scientometric indicator, we should strive to develop the correct set of them. David Parnas did an excellent job in recording a number of existing and significant problems with current publication methodologies. In the spirit of his ideas, we describe here, for the first time, another dimension of publication methodologies, the existence of scientospam, and set forth an effort to discover the spamming patterns in citation networks.

The astute reader will have realized by now that in our battle against scientospam we have in our arsenal the research works dealing with Web link spam [2], e.g., TrustRank, BadRank and so on. Unfortunately, the situation is radically different in citation networks, because they consist of entities richer than the Web pages and the Web links encountered in Web spam. Each node, i.e., a citing article, in a citation network consists of entities, i.e., co-authors, which form a complex overlay network above the article citation network.

We believe that the detection of spamming patterns in citation networks is quite a difficult procedure, and the cooperation of the authors is mandatory. Maybe the scientific community should set some rules about citing, rules that are not only ethical but practical as well. For instance, we could have subsections in the "References" section of each published article to describe which citations involve only relevant work, which citations refer to earlier work done by the authors of the article, which citations refer to works implemented as competing works in the article, and so on. Apart from these organizational categories, others could be devised as well, e.g., whether the citing article's results contradict or support the results of the cited articles, and many others.

In any case, we believe that scientometric indicators are not a panacea, and we should work a lot before applying a set of them to characterize the achievements of a scholar. Indicators do have their significance, but some methodologies, both ethical and practical, should change in order to have reliable and automated measurements of science.

References

[1] J. H. Fowler and D. W. Aksnes. Does self-citation pay? Scientometrics, 72(3):427-437, 2007.



[2] Z. Gyongyi and H. Garcia-Molina. Spam: It's not just for inboxes anymore. IEEE Computer, pages 28-34, October 2005.

[3] I. Hellsten, R. Lambiotte, and A. Scharnhorst. Self-citations, co-authorships and keywords: A new 
approach to scientists' field mobility. Scientometrics, 72(3):469-486, 2007. 

[4] J. E. Hirsch. An index to quantify an individual's scientific research output. Proceedings of the 
National Academy of Sciences, 102(46):16569-16572, 2005. 

[5] D. T. Parnas. Stop the numbers game: Counting papers slows down the rate of scientific progress. 
Communications of the ACM, 50(11):19-21, 2007. 

[6] J. Ren and R. N. Taylor. Automatic and versatile publications ranking for research institutions and scholars. Communications of the ACM, 50(6):81-85, 2007.

[7] M. Schreiber. Self-citation corrections for the Hirsch index. Europhysics Letters, 78(3), 2007. 

[8] A. Schubert, W. Glanzel, and B. Thijs. The weight of author self-citations. A fractional approach to self-citation counting. Scientometrics, 67(3):503-514, 2006.

[9] A. Sidiropoulos, D. Katsaros, and Y. Manolopoulos. Generalized Hirsch h-index for disclosing 
latent facts in citation networks. Scientometrics, 72(2):253-280, 2007. 



